57 research outputs found

Computation- and Space-Efficient Implementation of SSA

Author: Korobeynikov Anton
Publication date: 01/01/2010

The computational complexity of the individual steps of basic SSA is discussed. It is shown that general-purpose "black-box" routines (e.g. those found in packages such as LAPACK) waste a great deal of computation time because they ignore the special Hankel structure of the trajectory matrix. We outline several state-of-the-art algorithms (for example, Lanczos-based truncated SVD) that can be modified to exploit this structure. The key components are Hankel matrix-vector multiplication and the hankelization operator. We show that both can be computed efficiently by means of the Fast Fourier Transform. These methods reduce the worst-case computational complexity from O(N^3) to O(k N log(N)), where N is the series length and k is the number of eigentriples desired.

Comment: 27 pages, 8 figures
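The FFT-based Hankel matrix-vector product mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names `hankel_matvec` and `dense_hankel` are hypothetical, and the sketch assumes a real-valued series x of length N and a vector v of length K, with the L x K trajectory matrix H[i, j] = x[i + j] and L = N - K + 1.

```python
import numpy as np

def dense_hankel(x, K):
    """Build the L x K trajectory (Hankel) matrix explicitly, for reference only."""
    x = np.asarray(x, dtype=float)
    L = x.size - K + 1
    return x[np.arange(L)[:, None] + np.arange(K)[None, :]]

def hankel_matvec(x, v):
    """Compute H @ v without forming H, where H[i, j] = x[i + j].

    The product y[i] = sum_j x[i + j] * v[j] is a correlation, so it equals
    a slice of the linear convolution of x with the reversed vector v.
    """
    x = np.asarray(x, dtype=float)
    v = np.asarray(v, dtype=float)
    N, K = x.size, v.size
    L = N - K + 1
    if L < 1:
        raise ValueError("v must not be longer than the series")
    n_fft = N + K - 1                      # length of the full linear convolution
    fx = np.fft.rfft(x, n_fft)             # zero-padded transforms
    fv = np.fft.rfft(v[::-1], n_fft)
    conv = np.fft.irfft(fx * fv, n_fft)
    return conv[K - 1:K - 1 + L]           # entries that correspond to rows of H

# Usage: compare against the explicit O(L*K) product.
rng = np.random.default_rng(0)
series = rng.standard_normal(50)
vec = rng.standard_normal(20)
assert np.allclose(dense_hankel(series, 20) @ vec, hankel_matvec(series, vec))
```

The explicit product costs O(L K) operations and needs the full matrix in memory, whereas the FFT route costs O(N log N) and only stores the series, which is the kind of saving the abstract refers to.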

arXiv.org e-Print Archive

CiteSeerX

Assessing the Significance of Peptide Spectrum Match Scores

Author: Abramova Anastasiia
Korobeynikov Anton
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 17th International Workshop on Algorithms in Bioinformatics (WABI 2017)
Publication date: 01/01/2017

Peptidic Natural Products (PNPs) are highly sought-after bioactive compounds that include many antibiotic, antiviral and antitumor agents, immunosuppressors and toxins. Even though recent advancements in mass spectrometry have led to the development of accurate sequencing methods for nonlinear (cyclic and branch-cyclic) peptides, requiring only picograms of input material, the identification of PNPs via a database search of mass spectra remains problematic. This holds particularly true when trying to evaluate the statistical significance of Peptide Spectrum Matches (PSMs), especially when working with nonlinear peptides that often contain non-standard amino acids and modifications and have an overall complex structure. In this paper we describe a new way of estimating the statistical significance of a PSM, defined by any peptide (linear or nonlinear), using state-of-the-art Markov Chain Monte Carlo methods. In addition to the estimate itself, our method provides an uncertainty estimate in the form of confidence bounds, as well as an automatic simulation stopping rule that ensures that the sample size is sufficient to achieve the desired level of accuracy.

Dagstuhl Research Online Publication Server

Basic Singular Spectrum Analysis and Forecasting with R

Author: Nina Golyandina
Anton Korobeynikov
Publication venue: 'Elsevier BV'
Publication date: 18/01/2013

Singular Spectrum Analysis (SSA) is considered as a tool for the analysis and forecasting of time series. The main features of the Rssa package, which implements the SSA algorithms and methodology in R, are described, and examples of its use are presented. Analysis, forecasting and parameter estimation are demonstrated by means of a case study with accompanying R code.
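For readers unfamiliar with the method, the four steps of basic SSA (embedding, SVD, grouping, diagonal averaging) can be sketched as follows. This is a plain NumPy illustration, not the Rssa package or its API: the function name `ssa_reconstruct` and its parameters are hypothetical, and a dense SVD is used for brevity.

```python
import numpy as np

def ssa_reconstruct(x, L, components):
    """Reconstruct a series from selected SSA components.

    x          : 1-D series of length N
    L          : window length, 1 < L < N
    components : indices of the eigentriples to keep (0 = leading one)
    """
    x = np.asarray(x, dtype=float)
    N = x.size
    K = N - L + 1

    # 1. Embedding: build the L x K trajectory matrix, X[i, j] = x[i + j].
    X = x[np.arange(L)[:, None] + np.arange(K)[None, :]]

    # 2. Decomposition: singular value decomposition of the trajectory matrix.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)

    # 3. Grouping: sum the elementary matrices of the chosen eigentriples.
    idx = list(components)
    Xg = (U[:, idx] * s[idx]) @ Vt[idx, :]

    # 4. Diagonal averaging (hankelization): average over anti-diagonals.
    total = np.zeros(N)
    counts = np.zeros(N)
    for i in range(L):
        total[i:i + K] += Xg[i]
        counts[i:i + K] += 1
    return total / counts

# Usage: a pure sinusoid has a rank-2 trajectory matrix, so the two leading
# eigentriples reproduce it exactly (up to rounding).
t = np.arange(60)
sine = np.sin(2 * np.pi * t / 12)
assert np.allclose(ssa_reconstruct(sine, 24, [0, 1]), sine)
```

In practice the choice of window length and of the grouping is the analyst's decision; the indices passed here are only an example.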

arXiv.org e-Print Archive

Crossref

Multivariate and 2D Extensions of Singular Spectrum Analysis with the Rssa Package

Author: Golyandina Nina
Korobeynikov Anton
Shlemov Alex
Usevich Konstantin
Publication venue: 'Foundation for Open Access Statistics'
Publication date: 01/01/2013

Implementation of multivariate and 2D extensions of singular spectrum analysis (SSA) by means of the R package Rssa is considered. The extensions include MSSA for simultaneous analysis and forecasting of several time series and 2D-SSA for analysis of digital images. A new extension of 2D-SSA called shaped 2D-SSA is introduced for the analysis of images of arbitrary shape, not necessarily rectangular. It is shown that an implementation of shaped 2D-SSA can serve as a basis for implementing MSSA and other generalizations. An efficient implementation of operations with Hankel and Hankel-block-Hankel matrices through the fast Fourier transform is suggested. Examples with code fragments in R, which explain the methodology and demonstrate the proper use of Rssa, are presented.

arXiv.org e-Print Archive

CiteSeerX

Hal - Université Grenoble Alpes

Directory of Open Access Journals

Journal of Statistical Software

MetaGT: A pipeline for de novo assembly of metatranscriptomes with the aid of metagenomic data

Author: Finn Rob
Kale Varsha
Korobeynikov Anton
Lapidus Alla L.
Prjibelski Andrey D.
Shafranskaya Daria
Publication date: 28/10/2022

While metagenome sequencing may provide insights into the genome sequences and composition of microbial communities, metatranscriptome analysis can be useful for studying the functional activity of a microbiome. RNA-Seq data make it possible to determine which genes are active in the community and how their expression levels depend on external conditions. Although the field of metatranscriptomics is relatively young, the number of projects related to metatranscriptome analysis increases every year and the scope of its applications expands. However, several problems complicate metatranscriptome analysis: the complexity of microbial communities, the wide dynamic range of transcriptome expression and, importantly, the lack of high-quality computational methods for assembling meta-RNA sequencing data. These factors reduce the contiguity and completeness of metatranscriptome assemblies and therefore affect downstream analysis. Here we present MetaGT, a pipeline for de novo assembly of metatranscriptomes, which is based on the idea of combining metatranscriptomic and metagenomic data sequenced from the same sample. MetaGT assembles metatranscriptomic contigs and fills in missing regions based on their alignments to the metagenome assembly. This approach makes it possible to overcome the described complexities, obtain complete RNA sequences and, additionally, estimate their abundances. Using various publicly available real and simulated datasets, we demonstrate that MetaGT yields a significant improvement in the coverage and completeness of metatranscriptome assemblies compared to existing methods that do not exploit metagenomic data. The pipeline is implemented in NextFlow and is freely available from https://github.com/ablab/metaGT.

Peer reviewed

PubMed Central

Helsingin yliopiston digitaalinen arkisto

A novel uncultured heterotrophic bacterial associate of the cyanobacterium Moorea producens JHB

Author: Barb Debby
Cummings Susie L.
Engene Niclas
Gerwick Lena
Gerwick William H.
Glukhov Evgenia
Korobeynikov Anton
Leao Tiago Ferreira
Publication venue: FIU Digital Commons
Publication date: 30/08/2016

Background: Filamentous tropical marine cyanobacteria such as Moorea producens strain JHB possess a rich community of heterotrophic bacteria on their polysaccharide sheaths; however, these bacterial communities have not yet been adequately studied or characterized. Results and discussion: Through efforts to sequence the genome of this cyanobacterial strain, the 5.99 MB genome of an unknown bacterium emerged from the metagenomic information, named here as Mor1. Analysis of its genome revealed that the bacterium is heterotrophic and belongs to the phylum Acidobacteria, subgroup 22; however, it is only 85 % identical to the nearest cultured representative. Comparative genomics further revealed that Mor1 has a large number of genes involved in transcriptional regulation, is completely devoid of transposases, is not able to synthesize the full complement of proteogenic amino acids and appears to lack genes for nitrate uptake. Mor1 was found to be present in lab cultures of M. producens collected from various locations, but not in other cyanobacterial species. Diverse efforts failed to culture the bacterium separately from filaments of M. producens JHB. Additionally, a co-culturing experiment between M. producens JHB possessing Mor1 and cultures of other genera of cyanobacteria indicated that the bacterium was not transferable. Conclusion: The data presented support a specific relationship between this novel uncultured bacterium and M. producens; however, verification of this proposed relationship cannot be done until the "uncultured" bacterium can be cultured.

Springer - Publisher Connector

PubMed Central

DigitalCommons@Florida International University

Improving Switch Lowering for The LLVM Compiler System

Author: Anton Korobeynikov
Publication date: 01/01/2007

Switch-case statements (or switches) provide a natural way to express multiway branching control flow. They are common in many applications, including compilers, parsers, text processing programs and virtual machines. Various optimizations for switches have been studied for many years. This paper describes the recent refactoring of switch lowering in the LLVM Compiler System.

CiteSeerX

Crossref

Solid-state fault current limiter for medium voltage distribution systems

Author: Ishchenko Anton
Ishchenko Dmitry
Korobeynikov Boris
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/12/2003

This paper presents a thyristor-controlled fault current limiter for medium voltage distribution systems (6-10 kV). The main goal of the proposed scheme is to alleviate the effect of the fault current on the switchgear and other equipment. The limiter is designed mainly for application in industries with large motors installed, where the motors' contribution to the fault current is significant. A model of the power system with the limiter has been developed using ATP-EMTP. Simulation results showing the efficiency of the limiter are presented in the paper. © 2003 IEEE

Michigan Technological University

Singular spectrum analysis with R

Author: Golyandina Nina
Korobeynikov Anton
Zhigljavsky Anatoly
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

This comprehensive and richly illustrated volume provides up-to-date material on Singular Spectrum Analysis (SSA). SSA is a well-known methodology for the analysis and forecasting of time series. More recently, SSA has also been used to analyze digital images and other objects that are not necessarily of planar or rectangular form and may contain gaps. SSA is multi-purpose and naturally combines both model-free and parametric techniques, which makes it a very special and attractive methodology for solving a wide range of problems arising in diverse areas, most notably those associated with time series and digital images. An effective, comfortable and accessible implementation of SSA is provided by the R package Rssa, which is available from CRAN and reviewed in this book. Written by prominent statisticians who have extensive experience with SSA, the book (a) presents the up-to-date SSA methodology, including multidimensional extensions, in language accessible to a large circle of users, (b) combines different versions of SSA into a single tool, (c) shows the diverse tasks that SSA can be used for, (d) formally describes the main SSA methods and algorithms, and (e) provides tutorials on the Rssa package and the use of SSA. The book offers a valuable resource for a very wide readership, including professional statisticians, specialists in signal and image processing, and specialists in numerous applied disciplines interested in using statistical methods for time series analysis, forecasting, and signal and image processing. The book is written at a level accessible to a broad audience and includes a wealth of examples; hence it can also be used as a textbook for undergraduate and postgraduate courses on time series analysis and signal processing.

CERN Document Server